Extracting and Visualizing Quotations from News Wires

Authors

  • Éric Villemonte de la Clergerie
  • Benoît Sagot
  • Rosa Stern
  • Pascal Denis
  • Gaëlle Recourcé
  • Victor Mignot
Abstract

We introduce SAPIENS, a platform for extracting quotations from news wires together with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which makes it possible to extract quotations with wide coverage and under an extended definition, one that includes quotations only part of which is a quote-delimited verbatim transcript. We describe the architecture of SAPIENS and how it was applied to a corpus of French news wires from the AFP news agency.

Similar resources

A Lexicon of French Quotation Verbs for Automatic Quotation Extraction

Quotation extraction is an important information extraction task, especially when dealing with news wires. Quotations can be found in various configurations. In this paper, we focus on direct quotations introduced by a parenthetical clause, headed by a “quotation verb”. Our study is based on a large French news wire corpus from the Agence France-Presse. We introduce and motivate an analysis at ...

Visualizing Topical Quotations Over Time to Understand News Discourse

We present the PICTOR browser, a visualization designed to facilitate the analysis of quotations about user-specified topics in large collections of news text. PICTOR focuses on quotations because they are a major vehicle of communication in the news genre. It extracts quotes from articles that match a user’s text query, and groups these quotes into “threads” that illustrate the development of s...

Automatically Detecting and Attributing Indirect Quotations

Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation ex...

Information Extraction and Interactive Visualization of Road Accident Related News

This paper describes a strategy for extracting information from raw data and visualizing it in a web browser. The raw data are collected from newspapers and are in English. Specific information is extracted through a text mining process, which is explained in detail. The derived information concerns road accident news specifically, although the raw data contain all kinds of news. ...

Content Collection and Analysis in the Domain of Epidemiology

We describe a system that tracks the spread of epidemics by automatically extracting content from the Web. The system continuously monitors a large set of news sources, extracts information from new articles, and accumulates the extracted facts in a database in real time. The system provides functionality for visualizing results, as well as alerting capability. We present the current state of t...

Publication date: 2009